- The MAILING DATE of this communication appears on the cover sheet with the correspondence address - 
Period for Reply 

A SHORTENED STATUTORY PERIOD FOR REPLY IS SET TO EXPIRE 3 MONTH(S) FROM 
THE MAILING DATE OF THIS COMMUNICATION. 

- Extensions of time may be available under the provisions of 37 CFR 1.136(a). In no event, however, may a reply be timely filed 
after SIX (6) MONTHS from the mailing date of this communication. 

- If the period for reply specified above is less than thirty (30) days, a reply within the statutory minimum of thirty (30) days will be considered timely. 

- If NO period for reply is specified above, the maximum statutory period will apply and will expire SIX (6) MONTHS from the mailing date of this communication. 

- Failure to reply within the set or extended period for reply will, by statute, cause the application to become ABANDONED (35 U.S.C. § 133). 

- Any reply received by the Office later than three months after the mailing date of this communication, even if timely filed, may reduce any 
earned patent term adjustment. See 37 CFR 1.704(b). 

Status 

1) [ ] Responsive to communication(s) filed on . 

2a) [ ] This action is FINAL. 2b) [X] This action is non-final. 

3) [ ] Since this application is in condition for allowance except for formal matters, prosecution as to the merits is 
closed in accordance with the practice under Ex parte Quayle, 1935 C.D. 11, 453 O.G. 213. 

Disposition of Claims 

4) [X] Claim(s) 1-42 is/are pending in the application. 

4a) Of the above claim(s) is/are withdrawn from consideration. 

5) [ ] Claim(s) is/are allowed. 

6) [X] Claim(s) 1-42 is/are rejected. 

7) [ ] Claim(s) is/are objected to. 

8) [ ] Claim(s) are subject to restriction and/or election requirement. 

Application Papers 

9) [X] The specification is objected to by the Examiner. 

10) [X] The drawing(s) filed on 31 July 2000 is/are: a) [X] accepted or b) [ ] objected to by the Examiner. 

Applicant may not request that any objection to the drawing(s) be held in abeyance. See 37 CFR 1.85(a). 

11) [ ] The proposed drawing correction filed on is: a) [ ] approved b) [ ] disapproved by the Examiner. 

If approved, corrected drawings are required in reply to this Office action. 

12) [ ] The oath or declaration is objected to by the Examiner. 

Priority under 35 U.S.C. §§ 119 and 120 

13) [ ] Acknowledgment is made of a claim for foreign priority under 35 U.S.C. § 119(a)-(d) or (f). 

a) [ ] All b) [ ] Some* c) [ ] None of: 

1. [ ] Certified copies of the priority documents have been received. 

2. [ ] Certified copies of the priority documents have been received in Application No. . 

3. [ ] Copies of the certified copies of the priority documents have been received in this National Stage 
application from the International Bureau (PCT Rule 17.2(a)). 
* See the attached detailed Office action for a list of the certified copies not received. 

14) [ ] Acknowledgment is made of a claim for domestic priority under 35 U.S.C. § 119(e) (to a provisional application). 

a) [ ] The translation of the foreign language provisional application has been received. 

15) [ ] Acknowledgment is made of a claim for domestic priority under 35 U.S.C. §§ 120 and/or 121. 



Attachment(s) 

1) [X] Notice of References Cited (PTO-892) 

2) [ ] Notice of Draftsperson's Patent Drawing Review (PTO-948) 

3) [X] Information Disclosure Statement(s) (PTO-1449) Paper No(s) 2 . 

4) [ ] Interview Summary (PTO-413) Paper No(s). 

5) [ ] Notice of Informal Patent Application (PTO-152) 

6) [ ] Other: 



U.S. Patent and Trademark Office 
PTO-326 (Rev. 04-01) 

DETAILED ACTION 
Specification 

1. The disclosure is objected to because it contains embedded hyperlinks and/or 
other forms of browser-executable code, for example at page 1, line 14. Applicant is 
required to review the whole specification and delete the embedded hyperlinks and/or 
other forms of browser-executable code. See MPEP § 608.01. 

Claim Rejections - 35 USC §112 
The following is a quotation of the second paragraph of 35 U.S.C. 112: 

The specification shall conclude with one or more claims particularly pointing out and distinctly 
claiming the subject matter which the applicant regards as his invention. 

2. Claim 10 is rejected under 35 U.S.C. 112, second paragraph, as being indefinite 
for failing to particularly point out and distinctly claim the subject matter which applicant 
regards as the invention because the term "very" in claim 10 is a relative term which 
renders the claim indefinite. The term "very" is not defined by the claim, the 
specification does not provide a standard for ascertaining the requisite degree, and one 
of ordinary skill in the art would not be reasonably apprised of the scope of the 
invention. 

The art rejection of claim 10 is applied as best understood in light of the rejection 
under 35 U.S.C. 112, second paragraph discussed above. 

Claim Rejections - 35 USC § 102 

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that 
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(e) the invention was described in a patent granted on an application for patent by another filed in the 
United States before the invention thereof by the applicant for patent, or on an international application 
by another who has fulfilled the requirements of paragraphs (1), (2), and (4) of section 371(c) of this 
title before the invention thereof by the applicant for patent. 

The changes made to 35 U.S.C. 102(e) by the American Inventors Protection Act 
of 1999 (AIPA) do not apply to the examination of this application as the application 
being examined was not (1) filed on or after November 29, 2000, or (2) voluntarily 
published under 35 U.S.C. 122(b). Therefore, this application is examined under 35 
U.S.C. 102(e) prior to the amendment by the AIPA (pre-AIPA 35 U.S.C. 102(e)). 

3. Claims 1-6, 8, 10-14, 19-21, 23-42 are rejected under 35 U.S.C. 102(e) as being 
anticipated by Aiken (US 6,240,409). 

Regarding claim 1, Aiken discloses a method for detecting similar documents 
including all the claimed subject matter (see Figures 1a, b, column 3 lines 44-47). Note 
the steps of obtaining a document (102) and filtering the document (106). The claimed step of 
generating a tuple for the filtered document is met by the fact that a hash value and 
position pair is created and stored (see step 114, column 6, lines 7-28). The tuple is 
clearly compared with a plurality of tuples as claimed. Aiken discloses detecting if the 
document is similar to another document by determining if the tuple is clustered with 
another tuple in the document storage structure (see Figures 4a, 4b, 4c, column 7, 
lines 25-34, column 10, line 4- column 12, line 2). 
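
For illustration only, the general sequence discussed above (obtain a document, filter it, 
generate hash value and position pairs, and compare those pairs against stored pairs) 
may be sketched as follows. This is a minimal sketch under stated assumptions and is not 
a characterization of Aiken's disclosure or of the claimed method: the stop-word list, the 
window size k, the use of SHA-1, and all function names are hypothetical choices made for 
the example. 

```python
import hashlib
import re

# Assumed stop-word list, chosen only for this example.
STOP_WORDS = {"the", "and", "this", "is"}


def filter_document(text):
    """Lowercase the text, drop punctuation, and remove stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def make_tuples(tokens, k=3):
    """Return (hash value, position) pairs, one per run of k consecutive tokens."""
    pairs = []
    for pos in range(len(tokens) - k + 1):
        chunk = " ".join(tokens[pos:pos + k])
        digest = hashlib.sha1(chunk.encode("utf-8")).hexdigest()
        pairs.append((digest, pos))
    return pairs


def shares_tuple(doc_a, doc_b, k=3):
    """Report whether the two documents have at least one hash value in common."""
    hashes_a = {h for h, _ in make_tuples(filter_document(doc_a), k)}
    hashes_b = {h for h, _ in make_tuples(filter_document(doc_b), k)}
    return bool(hashes_a & hashes_b)
```

In this sketch two documents are treated as candidates for similarity as soon as one 
hash value coincides; a practical system would likely require more than a single match. 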

Regarding claim 2, Aiken discloses parsing and filtering the document (see 
column 4, lines 54-67). Clearly the filtered document comprises a token stream of a 
plurality of tokens as claimed. 

Regarding claim 3, Aiken discloses retaining a token according to at least a token 
threshold (see column 11, lines 15-30). 

Regarding claim 4, Aiken discloses that the retained tokens are arranged in the 
token stream (see Figure 4a, step 404). 

Regarding claim 5, Aiken discloses determining the hash value for the filtered 
document by processing individually each retained token in the token stream (see 
column 6, lines 7-28, column 9, lines 24-26). 

Regarding claim 6, Aiken discloses determining a score for each token in the 
token stream and comparing the score for each token to a first token threshold (see 
column 11, lines 15-30). The token stream is clearly modified by removing each token 
having a score not satisfying the first token threshold and retaining each token having a 
score satisfying the first token threshold as claimed since the document not containing a 
certain match ratio is discarded in the method of Aiken. 
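
For illustration only, scoring each token and retaining only those tokens whose score 
satisfies a threshold may be sketched as follows. The scoring rule (a smoothed inverse 
document frequency), the reading of "satisfying" as "greater than or equal to", and the 
function names are assumptions made for the example; neither Aiken nor the claims are 
asserted to use this particular score. 

```python
import math


def idf_scores(tokens, collection):
    """Score each distinct token by a smoothed inverse document frequency.

    `collection` is a list of token sets, one set per stored document.
    """
    n = len(collection)
    scores = {}
    for tok in set(tokens):
        df = sum(1 for doc in collection if tok in doc)
        scores[tok] = math.log((n + 1) / (df + 1))
    return scores


def filter_by_threshold(tokens, scores, threshold):
    """Keep, in original order, the tokens whose score is at least `threshold`."""
    return [t for t in tokens if scores.get(t, 0.0) >= threshold]
```

Under this assumed score, a token that appears in every stored document scores zero and 
is removed by any positive threshold, while rarer tokens are retained. 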

Regarding claim 8, Aiken discloses filtering by removing from the token stream at 
least one token corresponding to a stop word (see column 4, lines 57-58, column 8, line 
67- column 9, line 3). 

Regarding claim 10, Aiken discloses removing a token from a token stream if the 
token is a very frequent token, since Aiken shows that the method removes words such as "the", 
"and", "this", "is" (see column 4, lines 57-58, column 8, line 67- column 9, line 3). 

Regarding claim 11, Aiken discloses removing a token from a token stream (see 
column 4, lines 57-58, column 8, line 67- column 9, line 3). 

Regarding claim 12, Aiken discloses removing formatting from the document 
(see column 4, lines 55-57). 

Regarding claims 13, 14, clearly the method of Aiken uses collection statistics 
pertaining to a plurality of documents for filtering the document since the input file is 
compared to a set of collected files to detect similarity (see column 2, lines 47-51). The 
collection statistics have to be present for the collected documents to be clustered as 
shown in the method of Aiken (see Figure 4c, column 11, line 47- column 12, line 2). 

Regarding claim 19, Aiken discloses a hash table (see column 12, lines 40-44). 

Regarding claim 20, Aiken discloses that the document storage structure 
comprises a tree (see column 8, lines 30-38). 

Regarding claim 21, Aiken discloses that the tree comprises a binary tree (see 
column 8, lines 36-38). 

Regarding claim 23, Aiken discloses a hash table and at least one tree (see 
column 5, lines 33-40, column 8, lines 30-38). 

Regarding claim 24, Aiken discloses inserting the tuple into the document 
storage structure (see Figure 1a, 1b, 4a, 4b, 4c). 

Regarding claim 25, the hash table of Aiken clearly comprises a plurality of bins 
of tuples as claimed and the step of determining if the tuple is clustered with another 
tuple clearly comprises determining if the tuple is co-located with another tuple at a bin of 
a hash table (see Figures 1, 2, 4c, column 7, line 46- column 8, line 33). 
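
For illustration only, determining whether a tuple is co-located with another tuple at a 
bin of a hash table may be sketched as follows. The class name, the use of document 
identifiers as bin entries, and the choice to key bins directly on the hash value are 
assumptions made for the example and are not drawn from Aiken. 

```python
from collections import defaultdict


class BinTable:
    """Hash table whose bins hold the ids of documents sharing a hash value."""

    def __init__(self):
        self._bins = defaultdict(list)

    def insert(self, hash_value, doc_id):
        """Add the entry to its bin and return the other ids already in that bin."""
        bin_entries = self._bins[hash_value]
        neighbours = [d for d in bin_entries if d != doc_id]
        bin_entries.append(doc_id)
        return neighbours
```

A non-empty return value indicates that the inserted entry landed in an occupied bin, 
which in this sketch is the signal that the corresponding documents may be similar. 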

Regarding claim 26, Aiken discloses a tree comprising a plurality of branches, 
each bucket of the tree comprising at least one tuple and wherein the step of 
determining if the tuple is clustered with another tuple clearly comprises determining if 
the tuple is co-located with another tuple in a bucket of the tree (see column 8, 
lines 31-54, Figure 4c). 

Claims 27, 29 correspond to a system to perform the method of claim 1, thus are 
rejected for the same reasons stated in claim 1 above. 

Claim 28 corresponds to a computer program product to perform the method of 
claim 1, thus is rejected for the same reasons stated in claim 1 above. 

Claim 30 is a mere combination of claims 1-4, 26, thus is rejected for the same 
reasons stated in claims 1-4, 26 above. 

Claims 31, 33 correspond to a system to perform the method of claim 30, thus 
are rejected for the same reasons stated in claim 30 above. 

Claim 32 corresponds to a computer program product to perform the method of 
claim 30, thus is rejected for the same reasons stated in claim 30 above. 

Regarding claim 34, Aiken discloses all the claimed subject matter including 
determining a hash value for a document (see Figure 1, column 4, line 17- column 7, 
line 45, column 9, lines 16-30), accessing a document storage structure comprising a 
plurality of hash values, each hash value representing one of a plurality of documents 
(see Figure 4a, column 10, line 4- column 11, line 46), determining if the hash value is 
equivalent to another hash value in the document storage structure (see Figure 4c, 
column 11, line 47- column 12, line 2). 

Claims 35, 37 correspond to a system to perform the method of claim 34, thus 
are rejected for the same reasons stated in claim 34 above. 

Claim 36 corresponds to a computer program product to perform the method of 
claim 34, thus is rejected for the same reasons stated in claim 34 above. 

Regarding claim 38, Aiken discloses a method for detecting similar documents 
including comparing a document to a plurality of documents in a document collection 
using a hash algorithm and collection statistics to detect if the document is similar to any 
of the documents in the document collection (see Figures 1a, b, 4a, 4b, 4c). The 
claimed collection statistics have to be present and used in the method of Aiken for the 
similar documents to be clustered (see Figure 4c, column 11, line 47- column 12, line 
2). 

Regarding claim 39, clearly the collection statistics pertain to the document 
collection since the statistics are used in clustering the collected documents (see Figure 
4c). 

Claims 40, 42 correspond to a system to perform the method of claim 38, thus 
are rejected for the same reasons stated in claim 38 above. 

Claim 41 corresponds to a computer program product to perform the method of 
claim 38, thus is rejected for the same reasons stated in claim 38 above. 

Claim Rejections - 35 USC § 103 

The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

4. Claims 7, 9, 15-18, 22 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Aiken (US 6,240,409). 

Regarding claim 7, although Aiken does not specifically show the step of 
comparing the score for each retained token to a second token threshold and modifying 
the token stream as claimed, Aiken explicitly shows that not every substring's hash value 
is stored (see column 6, lines 29-30). Therefore, it would have been obvious to one of 
ordinary skill in the art to include the claimed feature while implementing the method 
taught by Aiken in order to further filter the document and save memory. 

Regarding claim 9, although Aiken does not explicitly disclose filtering by 
removing a duplicate of another token in the token stream, it would have been obvious 
to one of ordinary skill in the art to include such a feature in order to avoid processing 
redundant tokens, thus saving time and resources. 

Regarding claims 15-18, although Aiken does not explicitly show that the method 
uses specific hash algorithms as claimed, it is notoriously well known in the art to use 
different hash algorithms depending on users' requirements. Therefore, it would have 
been obvious to one of ordinary skill in the art to include all the claimed features while 
implementing the method of Aiken in order to suit users' needs. 

Regarding claim 22, although Aiken does not explicitly show that the binary tree 
is balanced, it would have been obvious to one of ordinary skill in the art to include such 
a feature while implementing the method of Aiken in order to store data efficiently and to 
facilitate searching and localization. 



Conclusion 

5. The prior art made of record and not relied upon is considered pertinent to 
applicant's disclosure. 

Broder et al. (US 6,119,124 and US 6,349,296) teach a method for clustering closely 
resembling data objects. 

6. Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Uyen T Le whose telephone number is 703-305-4134. 
The examiner can normally be reached on M-F 7:00-5:30. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Safet Metjahic, can be reached on 703-308-1436. The fax phone numbers 
for the organization where this application or proceeding is assigned are 703-746-7239 
for regular communications and 703-746-7238 for After Final communications. 

Any inquiry of a general nature or relating to the status of this application or 
proceeding should be directed to the receptionist whose telephone number is 
703-305-3900. 




Uyen Le 

October 21, 2002 



